Pronunciation modeling by sharing Gaussian densities across phonetic models
Abstract
Conversational speech exhibits considerable pronunciation variability, which has been shown to reduce the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Using such pronunciation models during recognition is known to improve accuracy. This paper describes their use during acoustic model training. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first illustrated: it is shown that simply improving the accuracy of the phonetic transcription used for acoustic model training is of little benefit. Analysis of this paradox leads to a new method of accommodating nonstandard pronunciations: rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones predicted by the pronunciation model, the HMM states of the phoneme’s model are instead allowed to share Gaussian mixture components with the HMM states of the model of the alternate realization. Qualitatively, this amounts to making a soft decision about which surface form is realized. Quantitative experiments on the Switchboard corpus show that this method improves accuracy by 1.7% (absolute).
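The idea of letting one HMM state borrow Gaussian mixture components from another can be illustrated with a small sketch. It is not the paper's implementation: the class names `Gaussian` and `HMMState`, the function `share_gaussians`, the fixed interpolation weight `share_weight`, and the toy parameters are all assumptions made for illustration. The abstract does not say how the mixture weights of the shared components are obtained, so a fixed interpolation is used here only to keep the example short.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    """Diagonal-covariance Gaussian density (hypothetical helper)."""
    mean: Sequence[float]
    var: Sequence[float]

    def pdf(self, x: Sequence[float]) -> float:
        log_p = 0.0
        for xi, mi, vi in zip(x, self.mean, self.var):
            log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        return math.exp(log_p)


@dataclass
class HMMState:
    """HMM state whose output density is a Gaussian mixture."""
    weights: List[float]
    components: List[Gaussian]

    def likelihood(self, x: Sequence[float]) -> float:
        return sum(w * g.pdf(x) for w, g in zip(self.weights, self.components))


def share_gaussians(canonical: HMMState, alternate: HMMState,
                    share_weight: float) -> HMMState:
    """Build a state that pools the Gaussians of both input states.

    The Gaussian objects are reused, not copied, so their parameters stay
    tied across states. `share_weight` is an illustrative assumption: the
    total mixture mass given to the alternate realization's components.
    """
    if not 0.0 <= share_weight <= 1.0:
        raise ValueError("share_weight must lie in [0, 1]")
    weights = [(1.0 - share_weight) * w for w in canonical.weights]
    weights += [share_weight * w for w in alternate.weights]
    return HMMState(weights, canonical.components + alternate.components)


# Toy one-dimensional states; the numbers are made up for the example.
canonical_state = HMMState([0.6, 0.4],
                           [Gaussian([0.0], [1.0]), Gaussian([1.0], [0.5])])
alternate_state = HMMState([1.0], [Gaussian([3.0], [1.0])])
shared_state = share_gaussians(canonical_state, alternate_state, share_weight=0.3)

frame = [2.5]  # an acoustic frame that lies closer to the alternate realization
print(canonical_state.likelihood(frame), shared_state.likelihood(frame))
```

With the pooled mixture, a frame that resembles the alternate realization still receives a reasonable likelihood under the canonical phoneme's state, which is one way to read the "soft decision" described in the abstract: no hard choice between surface forms is made for the frame.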
Related resources
Pronunciation variation speech recognition without dictionary modification on sparse database
Generally, a speech recognition system uses a fixed set of pronunciations according to the dictionary for training and decoding. However, even a well-defined lexicon cannot be used to support all variations in human’s pronunciation. Besides, in order to cover all possible pronunciations, the size of the dictionary would be too large to implement. Sharing gaussian densities across phonetic model...
Modeling Cantonese pronunciation variation by acoustic model refinement
Pronunciation variations can be roughly classified into two types: a phone change or a sound change [1][2]. A phone change happens when a canonical phone is produced as a different phone. Such a change can be modeled by converting the baseform (standard) phone to a surfaceform (actual) phone. A sound change happens at a lower, phonetic or subphonetic level within a phone and it cannot be modele...
Pronunciation modeling by sharing Gaussians
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to i...
Subphonetic Modeling for Speech Recognition
How to capture important acoustic clues and estimate essential parameters reliably is one of the central issues in speech recognition, since we will never have sufficient training data to model various acoustic-phonetic phenomena. Successful examples include subword models with many smoothing techniques. In comparison with subword models, subphonetic modeling may provide a finer level of detail...
A study of implicit and explicit modeling of coarticulation and pronunciation variation
In this paper, we focus on the modeling of coarticulation and pronunciation variation in Automatic Speech Recognition systems (ASR). Most ASR systems explicitly describe these production phenomena through context-dependent phoneme models and multiple pronunciation lexicons. Here, we explore the potential benefit of using feature spaces covering longer time segments in terms of implicit modeling...
Journal:
- Computer Speech & Language
Volume 14
Publication date: 1999